
  • parallelizing `wyoung` resampling command

    I am attempting to calculate adjusted p-values using the resampling methodology of Westfall and Young (1993). Fortunately, there is a handy and robust package `wyoung` that can perform this: https://github.com/reifjulian/wyoung

    Unfortunately, my data are relatively large, so even though my regressions are fairly parsimonious, the command takes a very long time to run. Example:

    Code:
        local yvars "outcome1 outcome2 outcome3 outcome4 outcome5 outcome6 outcome7 outcome8"
        wyoung `yvars', ///
        cmd(reg OUTCOMEVAR  explanatory_1 explanatory_2 explanatory_3  ///
                            explanatory_4 explanatory_5 explanatory_6    ///
                            explanatory_7 explanatory_8 explanatory_9  ///
                            explanatory_10 explanatory_11, vce(clu hhd_index)) ///
        familyp(explanatory_1 explanatory_2 explanatory_3  ///
                explanatory_4 explanatory_5 explanatory_6    ///
                explanatory_7 explanatory_8 explanatory_9  ///
                explanatory_10 explanatory_11) ///
        seed(33) boot(10000) cluster(my_cluster) replace
    The above command takes just over 4 days to run on a single node of a Slurm cluster. I want to parallelize this code to decrease runtime. I've looked into the `parallel` package (https://github.com/gvegayon/parallel), but I have not managed to adapt it to this `wyoung` process.

    1. Is there a way to parallelize this code using `parallel`?
    2. Is there another means by which I can decrease runtime?
    Last edited by Jack Reimer; 31 Mar 2022, 14:33.

  • #2
    I have not tried using parallel. However, note that the vast majority of the computation time is taken up by the regress command, which scales nearly perfectly with the number of processors (see Figure 4). Thus I do not think that the parallel package will help, unfortunately, unless your bottleneck is the number of cores in your Stata license. In that case the easiest (but also most expensive) option is to buy a license with more cores.
    Associate Professor of Finance and Economics
    University of Illinois
    www.julianreif.com
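
    One way to check whether the license is the bottleneck described above is to compare the number of cores Stata is using with the number on the machine. The following is a minimal sketch, not part of the original reply; it relies only on Stata's built-in c-class values and the Stata/MP `set processors` command, and assumes it is run inside the same Slurm job as the `wyoung` call.

    ```stata
    * Report how many cores this Stata session can use (built-in c-class values).
    display "Running Stata/MP (1 = yes): " c(MP)
    display "Cores currently in use:     " c(processors)
    display "Max cores usable here:      " c(processors_max)
    display "Cores on the machine:       " c(processors_mach)
    display "Cores allowed by license:   " c(processors_lic)

    * Stata/MP only: if fewer cores are in use than allowed, raise the setting.
    if c(MP) == 1 & c(processors) < c(processors_max) {
        set processors `c(processors_max)'
    }
    ```

    If the licensed core count is well below the machine's core count, the license is the likely constraint; if the two already match, adding cores through the license would not be expected to help.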
